wp_kses: add support for picture element and srcset attribute on img tags#6184
wp_kses: add support for picture element and srcset attribute on img tags#6184adamsilverstein wants to merge 30 commits intoWordPress:trunkfrom
Conversation
Test using WordPress PlaygroundThe changes in this pull request can previewed and tested using a WordPress Playground instance. WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser. Some things to be aware of
For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation. |
Co-authored-by: Andrew Ozz <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Pull Request Overview
This PR adds support for the HTML5 <picture> element and srcset attribute on <img> tags to WordPress's KSES filtering system. This enables proper handling of responsive images in content filtering.
- Adds
<picture>and<source>elements to allowed post tags - Adds
srcsetandsizesattributes to<img>and<source>elements - Refactors URI sanitization logic to handle multi-URI attributes like
srcset
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/wp-includes/kses.php | Adds picture/source elements to allowed tags, adds srcset/sizes attributes, and creates new wp_kses_sanitize_uris function to handle multi-URI attributes |
| tests/phpunit/tests/kses.php | Adds comprehensive test coverage for img srcset attributes, srcset sanitization, and picture element filtering |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
@adamsilverstein Can you resolve conflicts on this PR, as the testing team has to apply this patch to QA it? Thanks! |
yes, I will work on that. |
c402114 to
f9e002c
Compare
|
@adamsilverstein thanks for working on this ticket! |
# Conflicts: # src/wp-includes/kses.php
I have resolved the merge conflicts and am looking at adding additional tests. I'll also review feedback on the Trac ticket. Also fine punting this to 7.1 early. |
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the Core Committers: Use this line as a base for the props when committing in SVN: To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
Tests currently fail, documenting three bugs: - CDN resizer URLs with commas in paths are incorrectly split by wp_kses_sanitize_uris() naive comma splitting - Original spacing around commas is not preserved - decoding and fetchpriority attrs stripped from img tags Also adds passing coverage tests for wp_kses_uri_attributes, wp_kses_one_attr srcset handling, source/picture element edge cases, and the URI attributes filter hook.
These performance attributes are commonly added by WordPress core (wp_img_tag_add_loading_optimization_attrs) and should not be stripped when content passes through KSES filtering.
CDN image resizers (e.g. Cloudflare) use commas in URL paths like cdn-cgi/image/format=auto,quality=80,width=412/... which were incorrectly split by the naive comma-based preg_split in wp_kses_sanitize_uris(). This rewrites the splitting to use the srcset descriptor pattern (e.g. 480w, 2x) as entry boundaries, preserving commas within URLs. Also preserves original whitespace around separators instead of normalizing with implode.
WordPress coding standards require aligned double arrows in multi-line arrays. Adjusts spacing so all arrows in the data provider align with the longest key.
|
I added some fixes and additional tests. |
Expand test coverage for ticket #29807 with edge cases: CDN URLs with gravity=0.5x0.5 parameters, decimal pixel density descriptors, picture element type fallbacks and art direction, protocol-relative URLs, and descriptor-less final srcset entries. Update @SInCE to 7.1.0 per milestone.
|
Refreshed against current trunk (clean merge, CI green). Given 7.0 RC1 has passed, now targeting 7.1. |
| 'srcset' => true, | ||
| 'usemap' => true, | ||
| 'width' => true, | ||
| 'sizes' => true, |
There was a problem hiding this comment.
This attribute seems it should be moved up to appear before src to maintain alphabetic ordering.
| 'srcset' => true, | ||
| 'type' => true, | ||
| 'media' => true, | ||
| 'sizes' => true, |
| * @param string $attrname The attribute name to test. | ||
| * @param string $attrvalue The attribute value to sanitize. | ||
| * @param string[] $allowed_protocols Array of allowed URL protocols. | ||
| * @param string[] $multi_uri Optional. Attributes that can contain multiple URIs. Default is array( 'srcset' ). | ||
| * @return string Sanitized attribute value. | ||
| */ | ||
| function wp_kses_sanitize_uris( $attrname, $attrvalue, $allowed_protocols, $multi_uri = array( 'srcset' ) ) { | ||
| $uris = wp_kses_uri_attributes(); | ||
|
|
||
| if ( ! in_array( strtolower( $attrname ), $uris, true ) ) { | ||
| return $attrvalue; | ||
| } | ||
|
|
||
| if ( in_array( strtolower( $attrname ), $multi_uri, true ) ) { |
There was a problem hiding this comment.
Let's add underscores to some vars, like $attr_name and $attr_value, and then let's add attrs to others, including $multi_uri_attrs and $uri_attrs. Otherwise, I was reading this and I was confused why attribute names would be among URIs.
| * @param string[] $multi_uri Optional. Attributes that can contain multiple URIs. Default is array( 'srcset' ). | ||
| * @return string Sanitized attribute value. | ||
| */ | ||
| function wp_kses_sanitize_uris( $attrname, $attrvalue, $allowed_protocols, $multi_uri = array( 'srcset' ) ) { |
There was a problem hiding this comment.
I think these can add PHP type hints as well for the params and the return value.
| if ( preg_match( '/^(\s*)(.*?)(\s+\d+[wx])?\s*$/i', $entry, $m ) ) { | ||
| $leading_ws = $m[1]; | ||
| $url = $m[2]; | ||
| $descriptor = isset( $m[3] ) ? $m[3] : ''; |
There was a problem hiding this comment.
| $descriptor = isset( $m[3] ) ? $m[3] : ''; | |
| $descriptor = $m[3] ?? ''; |
| * Instead, split on: whitespace + descriptor + comma (+ optional whitespace). | ||
| * This correctly identifies only the commas that separate srcset entries. | ||
| */ | ||
| $parts = preg_split( '/(\s+\d+[wx]\s*,\s*)/i', $attrvalue, -1, PREG_SPLIT_DELIM_CAPTURE ); |
There was a problem hiding this comment.
| $parts = preg_split( '/(\s+\d+[wx]\s*,\s*)/i', $attrvalue, -1, PREG_SPLIT_DELIM_CAPTURE ); | |
| $parts = array_filter( (array) preg_split( '/(\s+\d+[wx]\s*,\s*)/i', $attrvalue, -1, PREG_SPLIT_DELIM_CAPTURE ) ); |
| for ( $i = 0, $len = count( $parts ); $i < $len; $i++ ) { | ||
| if ( preg_match( '/^\s+\d+[wx]\s*,\s*$/i', $parts[ $i ] ) ) { | ||
| // This is a delimiter: space + descriptor + comma. Append it as-is. | ||
| $result .= $parts[ $i ]; | ||
| } else { | ||
| // This is a URL (possibly with a trailing descriptor for the last entry). | ||
| $entry = $parts[ $i ]; |
There was a problem hiding this comment.
Using foreach can simplify this I believe.
| for ( $i = 0, $len = count( $parts ); $i < $len; $i++ ) { | |
| if ( preg_match( '/^\s+\d+[wx]\s*,\s*$/i', $parts[ $i ] ) ) { | |
| // This is a delimiter: space + descriptor + comma. Append it as-is. | |
| $result .= $parts[ $i ]; | |
| } else { | |
| // This is a URL (possibly with a trailing descriptor for the last entry). | |
| $entry = $parts[ $i ]; | |
| foreach( $parts as $entry ) { | |
| if ( preg_match( '/^\s+\d+[wx]\s*,\s*$/i', $entry ) ) { | |
| // This is a delimiter: space + descriptor + comma. Append it as-is. | |
| $result .= $entry; | |
| } else { | |
| // This is a URL (possibly with a trailing descriptor for the last entry). | |
| if ( preg_match( '/^(\s*)(.*?)(\s+\d+[wx])?\s*$/i', $entry, $m ) ) { |
| } else { | ||
| // This is a URL (possibly with a trailing descriptor for the last entry). | ||
| $entry = $parts[ $i ]; | ||
| if ( preg_match( '/^(\s*)(.*?)(\s+\d+[wx])?\s*$/i', $entry, $m ) ) { |
There was a problem hiding this comment.
Let's use $matches instead of $m to avoid abbreviation.
Trac ticket: https://core.trac.wordpress.org/ticket/29807
This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.